
Latent Field Discovery In Interacting Dynamical Systems With Neural Fields

Neural Information Processing Systems

Systems of interacting objects often evolve under the influence of field effects that govern their dynamics, yet previous works have abstracted away such effects, assuming that systems evolve in a vacuum. In this work, we focus on discovering these fields, inferring them from the observed dynamics alone rather than observing them directly.
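The abstract only sketches the idea, so as an illustrative aside (not the authors' implementation), a "field" here can be pictured as a small coordinate-to-vector network whose output perturbs each object's dynamics. The two-layer MLP and Euler integrator below are placeholder assumptions with random weights, just to make the setup concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer MLP "neural field": maps a 2D location to a 2D
# force vector. In the paper's setting such a field would be inferred from
# observed trajectories; here the weights are random placeholders.
W1 = rng.normal(size=(2, 32)); b1 = np.zeros(32)
W2 = rng.normal(size=(32, 2)); b2 = np.zeros(2)

def field(x):
    """Evaluate the field at positions x, shape (N, 2) -> (N, 2)."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def step(pos, vel, dt=0.01):
    """One Euler step: inter-object forces are omitted in this sketch; the
    field contributes an extra acceleration at each object's position."""
    acc = field(pos)
    return pos + dt * vel, vel + dt * acc

pos = rng.normal(size=(5, 2))
vel = np.zeros((5, 2))
pos, vel = step(pos, vel)
```

Fitting the field weights so that simulated trajectories match observed ones is what "discovering the field from dynamics alone" amounts to in this simplified picture.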


Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation

Konstantinidis, Fabian, Sackmann, Moritz, Hofmann, Ulrich, Stiller, Christoph

arXiv.org Artificial Intelligence

Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individual traffic participants. To improve efficiency, we adopt an instance-centric scene representation, where each traffic participant and map element is modeled in its own local coordinate frame. This design enables efficient, viewpoint-invariant scene encoding and allows static map tokens to be reused across simulation steps. To model interactions, we employ a query-centric symmetric context encoder with relative positional encodings between local frames. We use Adversarial Inverse Reinforcement Learning to learn the behavior model and propose an adaptive reward transformation that automatically balances robustness and realism during training. Experiments demonstrate that our approach scales efficiently with the number of tokens, significantly reducing training and inference times, while outperforming several agent-centric baselines in terms of positional accuracy and robustness.
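The key property of the instance-centric design is that pairwise geometry between local frames is viewpoint-invariant. As a minimal sketch (an assumed, simplified version of the relative positional encoding, not the paper's exact formulation), the relative pose of token j in token i's frame can be computed and checked for invariance under a global rigid transform:

```python
import numpy as np

def relative_pose(p_i, th_i, p_j, th_j):
    """Pose of token j expressed in token i's local frame.

    Returns (dx, dy, dtheta): the translation rotated into i's frame and
    the wrapped heading difference. Rigidly transforming the whole scene
    leaves these quantities unchanged, which is what makes encodings built
    from them viewpoint-invariant."""
    c, s = np.cos(-th_i), np.sin(-th_i)
    d = np.asarray(p_j, dtype=float) - np.asarray(p_i, dtype=float)
    dx, dy = c * d[0] - s * d[1], s * d[0] + c * d[1]
    dtheta = (th_j - th_i + np.pi) % (2 * np.pi) - np.pi
    return dx, dy, dtheta

# Invariance check: apply one global rotation+translation to both poses.
a = relative_pose((1.0, 2.0), 0.3, (4.0, 6.0), 1.1)
phi, t = 0.7, np.array([5.0, -3.0])
R = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
b = relative_pose(R @ np.array([1.0, 2.0]) + t, 0.3 + phi,
                  R @ np.array([4.0, 6.0]) + t, 1.1 + phi)
assert np.allclose(a, b)
```

Because static map tokens are encoded in their own frames, only these cheap relative quantities change between simulation steps, which is where the claimed reuse and efficiency come from.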


PPL: Point Cloud Supervised Proprioceptive Locomotion Reinforcement Learning for Legged Robots in Crawl Spaces

Ma, Bida, Xu, Nuo, Qi, Chenkun, Liu, Xin, Mo, Yule, Wang, Jinkai, Lu, Chunpeng

arXiv.org Artificial Intelligence

Legged locomotion in constrained spaces (called crawl spaces) is challenging. In crawl spaces, current proprioceptive locomotion learning methods struggle to achieve traversal because only ground features are inferred. In this study, a point cloud supervised RL framework for proprioceptive locomotion in crawl spaces is proposed. A state estimation network is designed to estimate the robot's collision states as well as the ground and spatial features needed for locomotion. A point cloud feature extraction method is proposed to supervise the state estimation network. The method represents the point cloud in a polar coordinate frame and uses MLPs for efficient feature extraction. Experiments demonstrate that, compared with existing methods, our method achieves faster iteration times in training and more agile locomotion in crawl spaces. This study enhances the ability of legged robots to traverse constrained spaces without requiring exteroceptive sensors. In recent years, legged robots have demonstrated remarkable terrain traversal capabilities, exhibiting significant application value.
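The abstract does not spell out the polar representation, so the following is a minimal stand-in (the paper's actual bin layout and feature design may differ): a 2D point cloud around the robot is reduced to a fixed-length vector of nearest-return distances per angular sector, which is the kind of compact input an MLP can consume efficiently:

```python
import numpy as np

def polar_features(points, n_bins=16, r_max=2.0):
    """Convert a 2D point cloud (N, 2) centered on the robot into a
    fixed-size polar histogram: the nearest return per angular sector,
    normalized to [0, 1] (illustrative sketch, not the paper's method)."""
    r = np.linalg.norm(points, axis=1)
    theta = np.arctan2(points[:, 1], points[:, 0])      # in (-pi, pi]
    bins = ((theta + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    feat = np.full(n_bins, r_max)                       # "no return" = r_max
    for b, ri in zip(bins, r):
        feat[b] = min(feat[b], ri)
    return feat / r_max
```

A fixed-length, rotation-ordered vector like this gives the supervising network a stable target regardless of how many raw points the sensor returns.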


METIS: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action Model

Fu, Yankai, Chen, Ning, Zhao, Junkai, Shan, Shaozhe, Yao, Guocai, Wang, Pengwei, Wang, Zhongyuan, Zhang, Shanghang

arXiv.org Artificial Intelligence

Building a generalist robot that can perceive, reason, and act across diverse tasks remains an open challenge, especially for dexterous manipulation. A major bottleneck lies in the scarcity of large-scale, action-annotated data for dexterous skills, as teleoperation is difficult and costly. Human data, with its vast scale and diverse manipulation behaviors, provides rich priors for learning robotic actions. While prior works have explored leveraging human demonstrations, they are often constrained by limited scenarios and a large visual gap between humans and robots. To overcome these limitations, we propose METIS, a vision-language-action (VLA) model for dexterous manipulation pretrained on multi-source egocentric datasets. We first construct EgoAtlas, which integrates large-scale human and robotic data from multiple sources, all unified under a consistent action space. We further extract motion-aware dynamics, a compact and discretized motion representation, which provides efficient and expressive supervision for VLA training. Built upon these, METIS integrates reasoning and acting into a unified framework, enabling effective deployment to downstream dexterous manipulation tasks. Our method demonstrates exceptional dexterous manipulation capabilities, achieving the highest average success rate in six real-world tasks. Experimental results also highlight its superior generalization and robustness to out-of-distribution scenarios. These findings emphasize METIS as a promising step toward a generalist model for dexterous manipulation.
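The abstract leaves the discretization unspecified, so purely as a hypothetical stand-in for a "compact and discretized motion representation": continuous per-step motion deltas can be mapped to integer tokens (here by uniform per-dimension binning, which is an assumption, not the paper's scheme) so that a language-model-style head can predict them:

```python
import numpy as np

def discretize_motion(deltas, n_bins=256, lo=-1.0, hi=1.0):
    """Map continuous motion deltas to integer tokens in [0, n_bins) by
    uniform binning per dimension (hypothetical sketch)."""
    x = np.clip(np.asarray(deltas, dtype=float), lo, hi)
    return np.round((x - lo) / (hi - lo) * (n_bins - 1)).astype(int)

def dediscretize(tokens, n_bins=256, lo=-1.0, hi=1.0):
    """Recover approximate continuous deltas from tokens (bin centers)."""
    return lo + np.asarray(tokens, dtype=float) / (n_bins - 1) * (hi - lo)
```

The round-trip error is bounded by half a bin width, which is the usual trade-off when exchanging continuous action regression for token prediction.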


Unsupervised learning of object frames by dense equivariant image labelling

Neural Information Processing Systems

One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformations. Starting from the recent idea of viewpoint factorization, we propose a new approach that, given a large number of images of an object and no other supervision, can extract a dense object-centric coordinate frame. This coordinate frame is invariant to deformations of the images and comes with a dense equivariant labelling neural network that can map image pixels to their corresponding object coordinates. We demonstrate the applicability of this method to simple articulated objects and deformable objects such as human faces, learning embeddings from random synthetic transformations or optical flow correspondences, all without any manual supervision.
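The core constraint in this entry is equivariance: if a labelling network assigns each pixel the coordinates of the underlying object point, then its predictions must agree across any known warp of the image. As an illustrative sketch (integer warps only, and not the paper's exact objective), that constraint can be written as a loss:

```python
import numpy as np

def equivariance_loss(labels, warped_labels, warp):
    """Penalty for violating label equivariance under a known warp.

    `labels[u, v]` are object coordinates predicted for pixel (u, v) of an
    image; `warped_labels` are predictions for the warped image; `warp`
    maps a pixel of the original image to its location in the warped one.
    Zero loss means the two predictions agree everywhere."""
    total = 0.0
    h, w = labels.shape[:2]
    for u in range(h):
        for v in range(w):
            uu, vv = warp(u, v)
            total += np.sum((labels[u, v] - warped_labels[uu, vv]) ** 2)
    return total / (h * w)
```

Minimizing such a loss over random synthetic transformations or optical-flow correspondences is what lets the object-centric frame emerge without manual supervision.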





Expanded methods

Neural Information Processing Systems

The graphical model of DGP is summarized in Figure 1. First let's define the potential function. Now let's define the Gaussian bump. We will write everything in vector form hereafter. We want to "let the data speak" and avoid oversmoothing, so the penalty weights Given the approximate posterior (eq. To understand the various terms in the ELBO above it is helpful to start with a simpler special case.


A translation invariance

Neural Information Processing Systems

In 2 dimensions, we use eq.

Simplified rotations. In 2 dimensions, the computations can be simplified since rotations commute. Thus, we wrap the computed angle difference so that it always belongs to that range. Furthermore, in all cases where angles are not used geometrically (e.g. for rotations), we

In 3 dimensions, the computation of rotation matrices is more involved than in the 2D case. As explained in section 2.1, input trajectories are described by the states. In the following equations, we remove time indices to reduce clutter.
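The angle-wrapping step mentioned above is a standard trick: an angle difference such as 350° vs 10° should read as 20°, not 340°. A minimal sketch of the wrap into (-π, π] (assuming that is "that range" referred to in the truncated text):

```python
import math

def wrap_angle(a):
    """Wrap an angle (difference) into [-pi, pi].

    atan2(sin a, cos a) discards whole turns while preserving the signed
    direction of the residual rotation."""
    return math.atan2(math.sin(a), math.cos(a))
```

Using the wrapped value keeps 2D rotation comparisons continuous near the ±π boundary, which is why commuting 2D rotations can be handled with plain angle arithmetic.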